16 research outputs found
On the consistency of Multithreshold Entropy Linear Classifier
Multithreshold Entropy Linear Classifier (MELC) is a recent classifier idea
which employs information theoretic concept in order to create a multithreshold
maximum margin model. In this paper we analyze its consistency over
multithreshold linear models and show that its objective function upper bounds
the amount of misclassified points in a similar manner like hinge loss does in
support vector machines. For further confirmation we also conduct some
numerical experiments on five datasets.Comment: Presented at Theoretical Foundations of Machine Learning 2015
(http://tfml.gmum.net), final version published in Schedae Informaticae
Journa
Fast optimization of Multithreshold Entropy Linear Classifier
Multithreshold Entropy Linear Classifier (MELC) is a density based model
which searches for a linear projection maximizing the Cauchy-Schwarz Divergence
of dataset kernel density estimation. Despite its good empirical results, one
of its drawbacks is the optimization speed. In this paper we analyze how one
can speed it up through solving an approximate problem. We analyze two methods,
both similar to the approximate solutions of the Kernel Density Estimation
querying and provide adaptive schemes for selecting a crucial parameters based
on user-specified acceptable error. Furthermore we show how one can exploit
well known conjugate gradients and L-BFGS optimizers despite the fact that the
original optimization problem should be solved on the sphere. All above methods
and modifications are tested on 10 real life datasets from UCI repository to
confirm their practical usability.Comment: Presented at Theoretical Foundations of Machine Learning 2015
(http://tfml.gmum.net), final version published in Schedae Informaticae
Journa
Extreme Entropy Machines: Robust information theoretic classification
Most of the existing classification methods are aimed at minimization of
empirical risk (through some simple point-based error measured with loss
function) with added regularization. We propose to approach this problem in a
more information theoretic way by investigating applicability of entropy
measures as a classification model objective function. We focus on quadratic
Renyi's entropy and connected Cauchy-Schwarz Divergence which leads to the
construction of Extreme Entropy Machines (EEM).
The main contribution of this paper is proposing a model based on the
information theoretic concepts which on the one hand shows new, entropic
perspective on known linear classifiers and on the other leads to a
construction of very robust method competetitive with the state of the art
non-information theoretic ones (including Support Vector Machines and Extreme
Learning Machines).
Evaluation on numerous problems spanning from small, simple ones from UCI
repository to the large (hundreads of thousands of samples) extremely
unbalanced (up to 100:1 classes' ratios) datasets shows wide applicability of
the EEM in real life problems and that it scales well
Analysis of Compounds Activity Concept Learned by SVM Using Robust Jaccard Based Low-dimensional Embedding
Support Vector Machines (SVM) with RBF kernel is one of the most successful models in machine learning based compounds biological activity prediction. Unfortunately, existing datasets are highly skewed and hard to analyze. During our research we try to answer the question how deep is activity concept modeled by SVM. We perform analysis using a model which embeds compounds’ representations in a low-dimensional real space using near neighbour search with Jaccard similarity. As a result we show that concepts learned by SVM is not much more complex than slightly richer nearest neighbours search. As an additional result, we propose a classification technique, based on Locally Sensitive ashing approximating the Jaccard similarity through minhashing technique, which performs well on 80 tested datasets (consisting of 10 proteins with 8 different representations) while in the same time allows fast classification and efficient online training